Genetic insights into ossification of the posterior longitudinal ligament of the spine

Ossification of the posterior longitudinal ligament of the spine (OPLL) is an intractable disease leading to severe neurological deficits. Its etiology and pathogenesis are primarily unknown. The relationship between OPLL and comorbidities, especially type 2 diabetes (T2D) and high body mass index (BMI), has been the focus of attention; however, no trait has been proven to have a causal relationship. We conducted a meta-analysis of genome-wide association studies (GWASs) using 22,016 Japanese individuals and identified 14 significant loci, 8 of which were previously unreported. We then conducted a gene-based association analysis and a transcriptome-wide Mendelian randomization approach and identified three candidate genes for each. Partitioning heritability enrichment analyses observed significant enrichment of the polygenic signals in the active enhancers of the connective/bone cell group, especially H3K27ac in chondrogenic differentiation cells, as well as the immune/hematopoietic cell group. Single-cell RNA sequencing of Achilles tendon cells from a mouse Achilles tendon ossification model confirmed the expression of genes in GWAS and post-GWAS analyses in mesenchymal and immune cells. Genetic correlations with 96 complex traits showed positive correlations with T2D and BMI and a negative correlation with cerebral aneurysm. Mendelian randomization analysis demonstrated a significant causal effect of increased BMI and high bone mineral density on OPLL. We evaluated the clinical images in detail and classified OPLL into cervical, thoracic, and the other types. GWAS subanalyses identified subtype-specific signals. A polygenic risk score for BMI demonstrated that the effect of BMI was particularly strong in thoracic OPLL. Our study provides genetic insight into the etiology and pathogenesis of OPLL and is expected to serve as a basis for future treatment development.

table disease leading to severe neurological deficits. Its etiology and pathogenesis are primarily unknown. The relationship between OPLL and comorbidities, especially type 2 diabetes (T2D) and high body mass index (BMI), has been the focus of attention; however, no trait has been proven to have a causal relationship. We conducted a meta-analysis of genome-wide association studies (GWASs) using 22,016 Japanese individuals and identified 14 significant loci, 8 of which were previously unreported. We then conducted a gene-based association analysis and a transcriptomewide Mendelian randomization approach and identified three candidate genes for each. Partitioning heritability enrichment analyses observed significant enrichment of the polygenic signals in the active enhancers of the connective/bone cell group, especially H3K27ac in chondrogenic differentiation cells, as well as the immune/hematopoietic cell group. Single-cell RNA sequencing of Achilles tendon cells from a mouse Achilles tendon ossification model confirmed the expression of genes in GWAS and post-GWAS analyses in mesenchymal and immune cells. Genetic correlations with 96 complex traits showed positive correlations with T2D and BMI and a negative correlation with cerebral aneurysm. Mendelian randomization analysis demonstrated a significant causal effect of increased BMI and high bone mineral density on OPLL. We evaluated the clinical images in detail and classified OPLL into cervical, thoracic, and the other types. GWAS subanalyses identified subtype-specific signals. A polygenic risk score for BMI demonstrated that the effect of BMI was particularly strong in thoracic OPLL. Our study provides genetic insight into the etiology and pathogenesis of OPLL and is expected to serve as a basis for future treatment development.

Editor's evaluation
This study builds on previous work to explore the genetic causes of ossification of the posterior longitudinal ligament of the spine (OPLL). A meta-Genome wide association study is conducted to increase detection power and a disease subtype analysis is completed that provides new information on if all sites of OPLL have uniform causes. Using additional open-source data, the GWAS results are explored further to find putatively causative genes and to explore causative co-existing conditions such as obesity for OPLL. Overall this study is by far the most complete genetic exploration of this disease to date and is instructive for future studies that will lead to treatments for this condition.

Introduction
Ossification of the posterior longitudinal ligament of the spine (OPLL) is an incurable disease with progressive heterotopic ossification. It can occur at any spine level from the cervical to the lumbar spine, and ossified ligaments compress the spinal cord and roots, leading to a severe neurological deficit (Matsunaga and Sakou, 2012). OPLL is a common disease; however, its frequency varies depending on the region of the world; high in Asian countries (0.4-3.0%), especially Japan (1.9-4.3%), compared with Europe and the United States (0.1-1.7%) (Matsunaga and Sakou, 2012;Ohtsuka et al., 1987;Sohn and Chung, 2013;Yoshimura et al., 2014). However, its etiology and pathogenesis remain unknown. Histological studies suggest that OPLL develops through endochondral ossification (Sato et al., 2007;Sugita et al., 2013). In recent years, OPLL has been reported to have different clinical characteristics depending on the affected region: higher body mass index (BMI), earlier-onset of symptoms, and more diffuse progression of OPLL over the entire spine in the thoracic type of OPLL (T-OPLL) than in the cervical type (C-OPLL) (Endo et al., 2020;Hisada et al., 2022). This fact suggests that there may be differences in etiology and pathogenesis for each subtype of OPLL. Currently, there is no therapeutic or preventive measure for OPLL other than surgery to decompress the spinal cord and roots. Therefore, it is necessary to clarify its etiology and pathogenesis to develop effective measures to prevent and treat OPLL.
OPLL is assumed to be a polygenic disease where complex genetic and environmental factors interact. Epidemiological studies have reported the relationship between OPLL and various other traits, especially type 2 diabetes (T2D) (Akune et al., 2001;Kobashi et al., 2004), high BMI (Hou et al., 2017;Kobashi et al., 2004), low inorganic phosphate, X-linked hypophosphatemic rickets (Chesher et al., 2018), and increased C-reactive protein (Kawaguchi et al., 2017). Of these traits, T2D has been the focus of attention for a long time (Akune et al., 2001;Kobashi et al., 2004). Furthermore, because of the high incidence within families and close relatives in previous epidemiological studies, genetic factors have long been considered in OPLL development Sakou et al., 1991;Terayama, 1989), although there are no previous papers evaluating the heritability of OPLL such as twin studies. To understand the genetic factors associated with OPLL, we previously conducted a genome-wide association study (GWAS) and found six significant loci (Nakajima et al., 2014). In subsequent in silico and in vitro functional studies, we identified RSPO2 as a susceptibility gene for OPLL, and the role of Wnt signaling in the pathogenesis of OPLL was clarified (Nakajima et al., 2016). However, the pathogenesis of this condition remains largely unknown.
In this study, to clarify the etiology and pathogenesis of OPLL, we conducted a meta-analysis of GWASs and various post-GWAS analyses. We identified 14 significant loci, including 8 previously unreported susceptibility loci. Using a gene-based analysis (de Leeuw et al., 2015) and summary data-based Mendelian randomization (SMR) (Zhu et al., 2016), we identified three candidate genes for each. Using a genetic correlation analysis and a subsequent Mendelian randomization (MR) study, we identified a causal effect of high BMI on OPLL. A polygenic risk score (PRS) of BMI demonstrated the heterogeneity of the impact of obesity on OPLL subtypes.

Novel susceptibility loci in OPLL
We conducted three GWASs (set 1-3) in the Japanese population (Supplementary file 1). After quality control of single-nucleotide polymorphism (SNP) genotyping data, we performed imputation and association analyses independently for each GWAS. Subsequently, we performed a fixed-effects metaanalysis combining the three GWASs (ALL-OPLL: a total of 2010 cases and 20,006 controls; Figure 1figure supplement 1) and identified 12 genome-wide significant loci (p<5.0 × 10 −8 ) (Figure 1). The genomic inflation factor (λGC) was 1.11 and showed slight inflation in GWAS; however, the intercept in linkage disequilibrium (LD) score regression  was 1.03, indicating that inflation of the statistics was mainly from polygenicity and minimal biases of the association results ( Figure 1-figure supplement 2).
Next, we conducted a stepwise conditional analysis to detect multiple independent signals. We detected two additional independent signals that showed genome-wide significance after conditioning (Supplementary file 2): rs35281060 (12p12.3, p=1.04 × 10 −10 ) and rs1038666 (12p11.22, p=2.37 × 10 −10 ) (Figure 1-figure supplement 3F-I). We also detected one additional signal (rs61915977, 12p11.22 p=1.39 × 10 −6 ) that reached locus-wide significance (p<5.0 × 10 -6 ) (Supplementary file 2). Thus, the meta-analysis and conditional analysis identified 14 genome-wide significant OPLL loci, including 8 novel loci. Significant associations of the six previously reported loci (Nakajima et al., 2014) were observed in the present study (Table 1, Figure 1, Figure 1-figure supplement 3). The estimated proportion of the phenotypic variance explained by all the variants used in the study was                  , indicating that OPLL has a high heritability. The lead variants of the 14 loci explained 6.5% of the phenotypic variance. Together with the LD score regression results, OPLL is a highly polygenic disease. Adjacent to lead variants in the novel loci, we found several candidate genes ( Figure 1) reported to be related to osteogenesis and could be connected to OPLL development. TMEM135 (transmembrane protein 135), a gene in the newly identified significant locus (11q14.2), is a multi-transmembrane protein with seven transmembrane helices of high confidence. It is more strongly expressed in multipotent adipose tissue-derived stem cells committed to osteoblastic cells than the adipogenic lineage (Scheideler et al., 2008). WWP2 (WW domain-containing E3 ubiquitin-protein ligase 2), the nearest gene to rs376989376 (the lead SNP in 16q22.1), was recently reported to serve as a positive regulator of osteogenesis by augmenting the transactivation of RUNX2, a master regulator of osteoblast differentiation as well as for chondrocyte maturation during skeletal development (Zhu et al., 2017).
All lead SNPs and SNPs in high LD (r 2 > 0.8) with them in previously unreported significant loci were in intron or intergenic regions, and none of them were exonic variants (Supplementary file 3). To prioritize putative causal variants, we conducted a Bayesian statistical fine-mapping analysis for significant loci using FINEMAP (Benner et al., 2016). The lead SNPs had the highest posterior probability (PP) in any significant region, and two of them were higher than 0.5: rs4665972 (2p23.3, p=0.548) and rs1038666 (12p11.22, p=0.533) (Supplementary file 4).

Statistical power analysis
We examined the statistical power for minor allele frequency (MAF) and odds ratio of lead SNPs within the 14 independent significant regions in GWAS meta-analysis for ALL-OPLL. The results showed that all had a power greater than 0.5 for a significance level of p-value = 5 × 10 −8 , and nine had a power greater than 0.8 ( Figure 1-figure supplement 4).

Enrichment in genes involved in bone metabolism
We conducted a gene set enrichment analysis implemented in FUMA (Watanabe et al., 2017). We found significant enrichment in the set related to bone mineral density (BMD): BMD of the heel (p=8.60 × 10 −8 ), pediatric lower limb (p=9.24 × 10 −5 ), and pediatric total body less head (p=2.68 × 10 −4 ) (Supplementary file 5), compatible with the critical roles of bone metabolism in OPLL. However, we observed no significant enrichment in BMD in adults measured by dual-energy X-ray absorptiometry in this analysis.

Identification of novel candidate genes missed by the GWAS metaanalysis
To identify other candidate genes, we conducted a gene-based association analysis (de Leeuw et al., 2015;Watanabe et al., 2017). We found three additional genes significantly associated with OPLL: EIF3E, EMC2, and TMEM135 ( Figure 2, Supplementary file 6). EIF3E and EMC2 are in the same locus most strongly associated with OPLL as RSPO2 (8q23.1.). EIF3E (eukaryotic translation initiation factor 3 subunit E) encodes a protein that is a component of the eukaryotic translation initiation factor 3 (eIF-3) complex, which functions in and is essential for several steps in the initiation of protein synthesis (Lee et al., 2015;Masutani et al., 2007). A proteomics study in a rat model of heterotopic ossification reported that Eif3e was upregulated in ossified tissues and may be involved in tissue ossification by regulating hypoxia-inducible factor (HIF) signaling, which has an important role in osteogenesis (Wei et al., 2022). EMC2 (endoplasmic reticulum membrane protein complex subunit 2) encodes a part of the endoplasmic reticulum membrane protein complex (EMC) that functions in the energy-independent insertion of newly synthesized membrane proteins into the endoplasmic reticulum membrane, an essential cellular process (Chitwood et al., 2018;O'Donnell et al., 2020). However, basic experiments evaluating the effects of EMC2 on ligament and bone tissue have not been reported, and the mechanisms involved in OPLL are unknown. On the other hand, this analysis reinforced the possible involvement of TMEM135 in the development of OPLL.
The lack of exonic variants suggests that altering gene expression levels is a key function of OPLLassociated variants. By searching expression quantitative trait loci (eQTL) data in all available tissues in GTEx (Consortium, 2015), we found 26 transcripts with cis-eQTL variants associated with OPLL signals; of these, 20 transcripts were in the novel loci (Supplementary file 7). Furthermore, SMR (Zhu et al., 2016) revealed a total of 10 gene-tissue pairs (three unique genes, namely, RSPO2, PLEC, and RP11-967K21.1) that surpassed the genome-wide significance level (P SMR < 8.4 × 10 -6 ) without heterogeneity (P HEIDI < 0.05) (Supplementary file 8). RSPO2 is located in the most significant locus in GWAS meta-analysis, and its functions related to OPLL were elucidated in a past study (Nakajima et al., 2016). PLEC is expressed in various tissues, including muscles and fibroblasts (Consortium, 2015), and PLEC deficiency causes epidermolysis bullosa simplex with muscular dystrophy (OMIM 226670) (Smith et al., 1996), in which osteoporosis frequently develops (Chen et al., 2019). Since increased expression of PLEC was estimated to have a causal effect on OPLL (Figure 1-figure supplement 5, Supplementary file 8), these results suggest that PLEC is a likely causal gene of OPLL. As for RP11-967K21.1, its function in OPLL development is currently unknown and is expected to be elucidated in future studies.

Cell groups and cell types related to OPLL
We conducted partitioning heritability enrichment analyses to investigate cell groups related to OPLL. We observed significant enrichment in the active enhancers of the connective/bone cell group and the immune/hematopoietic cell group (Supplementary file 9). We then analyzed each cell type belonging to these groups and found significant enrichment of H3K27ac in chondrogenic differentiation cells (Supplementary file 10). These results concord with previous findings that in OPLL chondrocyte differentiation in the endochondral ossification process occurs (Sugita et al., 2013) and provide new insights into the involvement of immune system cells in OPLL development, which has received little attention to date.

Subtype analyses of OPLL
Subtype-stratified GWAS meta-analyses were also conducted: cervical (C)-OPLL (820 cases and 14,576 controls) and thoracic (T)-OPLL (651 cases and 20,007 controls). Subsequently, we identified three significant loci for C-OPLL and nine significant loci for T-OPLL (Figure 1-figure supplements 6-9, Supplementary file 11). Of these loci, one in the C-OPLL analysis and nine in the T-OPLL analysis were not identified in the analysis of ALL-OPLL and other OPLL subtypes. However, most of the lead SNPs in these significant loci were rare variants. We cannot determine that these are the causal variants based on the present results alone, but there was an interesting variant among them. rs74707424, a leading SNP in the significant locus (19p12), is located in the 3′-untranslated region of the ZBTB40 gene. In a recent study using primary osteoblasts of mouse calvaria, Doolittle et al. reported that Zbtb40 functions as a regulator of osteoblast activity and bone mass, and knockdown of Zbtb40, but not Wnt4, in osteoblasts drastically reduced mineralization (Doolittle et al., 2020). We did not find significant genes in the gene-based analysis (data not shown).

Expression of candidate genes in the spinal ligament in humans and mice
We then examined the expression of 23 genes of interest (ALL-OPLL GWAS: 19 genes; gene-based analysis: 2 genes; and SMR: 2 genes) in the spinal ligament, a target tissue of OPLL (see Supplementary file 12 for detailed information on the number of genes). We used the deposited RNAsequencing (RNA-seq) data in the spinal ligament (yellow ligament) in patients with OPLL and cervical spondylotic myelopathy (CSM) (GSE188760), and chondrogenic differentiation of human spinal ligament cells and controls (GSE188759) (Tachibana et al., 2022). Both data sets included 20/23 genes, of which we found 14/20 (70%) and 15/20 (75%) expressed in spinal ligament tissue in GSE188760 and GSE188759, respectively. In addition, the expression tended to be different between OPLL patients and CSM patients. Especially, the expressions of WWP2, EIF3H, and SNX17 showed nominally significant differences (see 'Materials and methods' and Figure 1-figure supplement 10). WWP2 was more highly expressed in chondrogenic differentiated ligament cells than in undifferentiated ligament cells (see 'Materials and methods' and Figure 1-figure supplement 11). The expression of WWP2 has a positive effect on bone formation (Scheideler et al., 2008). We should revalidate these results with larger data sets in the future.
Next, to further explore expressions of these genes in single-cell levels, we used deposited singlecell RNA sequencing (scRNAseq) data of Achilles tendon cells in murine ossification models: burn/ tenotomy heterotopic ossification model (GSE126060) (Sorkin et al., 2020) and Achilles tendon puncture model (GSE188758) (Tachibana et al., 2022). The Uniform Manifold Approximation and Projection (UMAP) identified 13 and 9 clusters in GSE126060 and GSE188758, respectively (Figure 1-figure supplements [12][13][14][15]. Both data sets contained information on the same 14/23 genes. We confirmed that 12/14 (85.7%) of these genes are expressed in both mesenchymal and immune-related cells (macrophage, dendritic cell, and lymphocyte) in both data sets. These results are concordant with the results of partitioning heritability analysis, suggesting that not only the mesenchymal cells which differentiate into ligament and chondrocyte cells but also the immune cells are involved with ligament ossification.
We also conducted the same analyses for the candidate genes uniquely found in T-and C-OPLL and found the expression of most of the genes in ligamentous tissues.

Causality of high BMI on OPLL
Epidemiological studies have suggested a relationship between OPLL and various other diseases and traits (Akune et al., 2001;Endo et al., 2020;Kawaguchi et al., 2017;Kobashi et al., 2004), particularly with T2D (Akune et al., 2001;Kobashi et al., 2004). We investigated their relationship with OPLL using the GWAS data. We first calculated the genetic correlation between OPLL and 96 complex traits (mean number of around 130K) (Akiyama et al., 2019;Akiyama et al., 2017;Ishigaki et al., 2020;Kanai et al., 2018; see Supplementary file 13 for the traits analyzed). We found a positive genetic correlation between OPLL and BMI and T2D. The genetic correlation estimate (rg) was higher in the BMI group than in the T2D group. In addition, we identified new negative correlations between cerebral aneurysms (Figure 3, Supplementary file 13).
Next, we conducted a two-sample MR using summary data from GWASs (Akiyama et al., 2017;Bakker et al., 2020;Kemp et al., 2017;Spracklen et al., 2020) to assess the causal effects of these significant traits in genetic correlation analysis on OPLL (Evans and Davey Smith, 2015;Lawlor et al., 2008;Figure 4- figure supplement 1, Supplementary file 14). The result of genetic correlation analysis for osteoporosis barely did not reach the false discovery rate (FDR)-corrected significance level, but we included it in the MR evaluation because there have been several previous reports of a strong trend toward whole-body ossification in OPLL patients (Hukuda et al., 1983;Mori et al., 2016;Nishimura et al., 2018;Yoshii et al., 2019). In the analysis, we used BMD, the main diagnostic criteria item for osteoporosis. BMD in the spine may reflect artifacts from OPLL itself, but higher BMD was also reported in patients with OPLL in the femur and the radius, a non-weight-bearing bone (Sohn and Chung, 2013;Yamauchi et al., 1999).
The significant causal effect of increased BMI on ALL-OPLL was estimated using the inverse variance weighted (IVW) method and the weighted median method (Figure 4, Figure 4-figure supplement 2, Supplementary file 15). The average pleiotropic effect of the MR-Egger regression intercept was close to zero (MR-Egger intercept = 0.005, p=0.581), indicating no evidence of the influence of directional pleiotropy (Figure 4-figure supplement 2, Supplementary file 16). We also assessed potential bias in the MR with a leave-one-out analysis and funnel plots; however, we did not identify any obvious bias (Figure 4-figure supplement 3). In contrast, we could not find any causal effects of T2D on ALL-OPLL with any MR methods (              We performed a reverse-direction MR to evaluate the causality of OPLL on BMI, T2D, cerebral aneurysm, and BMD but did not find any significant causal effects on the four traits (Figure 4-figure  supplement 7, Supplementary files 17 and 18).

The large causal effect of high BMI and high BMD on T-OPLL
We estimated the causal effect of traits with a significant genetic correlation with OPLL subtypes. We found contrasting results between C-and T-OPLL. A significant causal effect of increased BMI on T-OPLL, but not on C-OPLL, was indicated by all four MR methods. All beta estimates on T-OPLL were greater than those in the analysis for ALL-OPLL (Figure 4, Figure 4-figure supplements 2  and 8, Supplementary file 15), suggesting that T-OPLL drove the causal effect of BMI on OPLL. The MR-Egger regression intercept was significantly negative, suggesting the existence of directional pleiotropy (Figure 4-figure supplement 8, Supplementary file 16); however, the results of other sensitivity analyses for robust causal inference (simple and weighted median methods) suggested its causality on T-OPLL (Figure 4, Figure 4-figure supplements 8 and 9, Supplementary file 15). As for BMD, a larger causal effect of increased BMD on T-OPLL compared to ALL-OPLL was also estimated (Figure 4, Figure 4-figure supplement 5, Supplementary file 15). We did not find any significant effects in T2D or cerebral aneurysms except for the results using the simple median method for cerebral aneurysms.

Additional MR for obesity-related traits
Based on the above MR results, we focused on the causal relationship between high BMI, that is obesity, and OPLL. First, we repeated MR using only Japanese BMI data (Akiyama et al., 2017) to estimate causal effects on OPLL as a sensitivity analysis. As a result, we confirmed the positive direction of the effect. Furthermore, the results were significant for T-OPLL, reinforcing the causality of BMI on T-OPLL (Supplementary file 19).
Next, we conducted additional MR for obesity-related traits. We used the data from the large GWAS meta-analyses for obesity-related traits from UK Biobank (UKBB) and Giant Consortium: BMI, waist-to-hip ratio (WHR), and WHR adjusted by BMI (WHRadjBMI) (Pulit et al., 2019;Supplementary file 20). Regarding causality from BMI to OPLL, all four MR methods showed significant causality for ALL-and T-OPLL, while only IVW method showed causality for C-OPLL (Figure 4-figure supplement 10, Supplementary file 21). Regarding causality from WHR to OPLL, 2/4 and 3/4 MR methods showed significant causality for ALL-and T-OPLL, respectively, but no significant results were obtained for C-OPLL by either method. For both traits, the magnitude of the causal effect size estimated by MR tended to be larger for C-, ALL-, and T-OPLL, in that order, suggesting a strong influence of obesity on the development of T-OPLL. On the other hand, no significant causal relationship was found in the MR on OPLL from WHRadjBMI, a surrogate index of abdominal adiposity. It suggests that systemic obesity, rather than a simple high percentage of abdominal fat, influences the development of OPLL.

The polygenic causal effect of high BMI on OPLL
In addition to the causal effects of significant variants with BMI on OPLL, we evaluated the shared polygenic architecture between BMI and OPLL. Moderate correlations were found between the effect sizes of the SNPs of ALL-and T-OPLL with BMI, especially in sets of SNPs with low p-values (p<0.0005), when calculating the correlation based on p-value in the BMI GWAS. We did not find such correlations when calculating the correlation based on the p-value in the OPLL GWAS (Figure 4-figure supplement 11). However, this difference was not observed in the sets of SNPs of any p-value groups in the C-OPLL analysis. These results support the theory that high BMI is one of the causal factors of OPLL, and its causal effect on OPLL is driven by that on T-OPLL.

Heterogeneity of impact of obesity inside OPLL subtypes
Finally, we generated a PRS of BMI and compared its effect on OPLL for ALL-OPLL, C-OPLL, and T-OPLL ( Figure 5-figure supplements 1 and 2). We found that BMI PRS could predict OPLL, especially T-OPLL ( Figure 5). Further, the effect sizes of BMI PRS on OPLL were all larger than that on T2D analyzed in a similar manner ('Materials and methods') (beta = 0.099, 95% CI 0.087-0.110), indicating that high BMI is a major cause of OPLL.
Comparing C-and T-OPLL, we found that the effect of BMI PRS was significantly greater in T-OPLL (p=0.016), even in a limited number of cases.

Validity of this study reinforced by the replication study
To assess the validity of our GWAS meta-analysis, we conducted an association analysis using replication data of additional 2105 individuals (212 T-OPLL cases and 1866 controls). As for ALL-OPLL, we confirmed that the direction of betas for the lead SNPs in most regions (13/14) is consistent between the GWAS meta-analysis and the replication study (p=1.83 × 10 -3 , binomial test). Thus, the validity of the ALL-OPLL signals was reinforced (Figure 1-figure supplement 16). As for T-OPLL, the concordant rate was a little lower (6/9 of the lead SNPs), explained by low power in the additional data set to detect associations of rare alleles.  Next, we re-analyzed MR with expanded results of OPLL ('Materials and methods'). The addition of new replication data further strengthened the significance of the causal effect from BMI and BMD to OPLL (Figure 4-figure supplement 12, Supplementary files 15 and 16). For T2D, the result was only nominally significant in the IVW method (p=4.86 × 10 -2 ), and the causal effect from T2D to OPLL was not certain. Results of cerebral aneurysms showed little change, indicating a little causal effect on OPLL. We performed the same replication analysis for MR using obesity-related traits data from UKBB and Giant consortium (Figure 4-figure supplement 13, Supplementary files 21 and 22). The causal relationship from them to OPLL was further reinforced for BMI and WHR. WHRadjBMI demonstrated comparable results.

Discussion
This study conducted a GWAS meta-analysis using 2010 OPLL cases and 20,006 controls and identified 14 significant loci, including 8 novel loci. Although many unknowns exist regarding the function of the six regions found in the past, functional analysis of chromosome 8 (q23.1), where RSPO2 is located, has progressed. rs374810, the most significant variant in the present and the past study, is located on the promoter region of RSPO2 in chondrocytes. Its risk allele (C allele) causes a loss of transcription factor C/EBPβ binding; therefore, RSPO2 expression is reduced. Finally, the Wnt-β-catenin signal that blocks chondrocyte differentiation of ligament MSCs is suppressed, which triggers OPLL generation (Nakajima et al., 2016). Some of the newly discovered OPLL-related signals are in or near genes such as TMEM135 and WWP2, which are associated with the activation of bone formation (Scheideler et al., 2008;Zhu et al., 2017). In addition, we identified six genes associated with OPLL using genebased analysis and an SMR: EIF3E, EMC2, TMEM135, PLEC, RSPO2, and RP11-967K21.1. RSPO2 and TMEM135 were also found in the GWAS meta-analysis, reinforcing the results. EIF3E is suspected to be involved in heterotopic ossification by regulating HIF signaling, which has an important role in osteogenesis (Wei et al., 2022). In addition, EIF3E is also the nearest gene to the lead variant within the significant locus in previous GWAS for heel bone mineral density: rs7815105; beta = -0.014 (same direction as this study); p=4.9 × 10 -11 . The PLEC mutation causes epidermolysis bullosa, in which osteoporosis is one of the main comorbidities. Although various causes have been reported to induce osteoporosis in this disease (Chen et al., 2019), the underlying biological mechanism by which PLEC mutations affect bone metabolism is unclear, and future studies are expected to elucidate this mechanism. Although the effects of EMC2 and RP11-967K21.1 on ligament and bone tissue have not been reported and are unknown, future studies may elucidate the mechanisms and reveal that they are the actual causal genes. Thus, our GWAS observations are compatible with the theory that OPLL is closely related to bone metabolism and develops through endochondral ossification (Sato et al., 2007;Sugita et al., 2013).
Partitioning heritability enrichment analyses by LD score regression identified that OPLL GWAS signals enrich not only the active enhancers of the connective/bone cell group but also those of the immune/hematopoietic cell group. It is well known that heterotopic ossification of ligaments and tendons, in which mesenchymal stem cells differentiate into osteochondrogenic cells rather than myocytes or tenocytes, is triggered by tissue damage due to trauma (Convente et al., 2018;Torossian et al., 2017). Tissue injury triggers a systematic inflammatory response with the mobilization of neutrophils, monocytes, and lymphocytes at various stages of inflammation. Monocytes/macrophages are involved in abnormal wound healing and influence the development of heterotopic ossification (Sorkin et al., 2020). This fact suggests that immune system cells are involved in the tissue repair process of the posterior longitudinal ligament tissue of the spine, which is the host that causes ossification in OPLL as well. In addition, gene expression analysis using scRNAseq data confirmed that the candidate genes discovered in this study were expressed not only in mesenchymal cells but also in immune cells.
We identified that OPLL was genetically correlated with other traits, positive for BMI and T2D and negative for cerebral aneurysms. High BMI (obesity) has been implicated in OPLL, and clinical studies have reported that the BMI of OPLL patients is significantly higher than that of non-patients (Endo et al., 2020;Kobashi et al., 2004). It is speculated that obesity promotes heterotopic bone formation in OPLL. Yamamoto et al. reported a high incidence of OPLL in the Zucker fatty rat, a model rat of obesity with a loss-of-function mutation in the leptin receptor gene (Yamamoto et al., 2004). However, basic studies with small animals have reported conflicting results regarding the effects of obesity on bone formation. In general, high-fat diet (HFD)-induced obesity is detrimental to bone formation and results in low bone density in mice . On the other hand, Lv et al. reported that the mRNA level of Runx2 in bone marrow-derived mesenchymal stem cells from HFD mice was significantly higher; in contrast, that of Pparγ, which suppresses osteoblast differentiation and promotes osteoclast differentiation, was significantly lower than the control mice (Lv et al., 2010). Thus, the results of basic experiments are varied and controversial. While the effects of obesity on systemic bone formation and heterotopic ossification may not be the same, in clinical studies, meta-analyses examining the association between obesity and BMD in adults have reported that obesity affects the increase in bone density in all groups: postmenopausal women, premenopausal women, and men. In addition, MR analysis of the relationship between BMI and BMD reported that increased BMI was positively causally related to heel BMD (Ma et al., 2021;Song et al., 2020) and lumbar BMD (Song et al., 2020). Thus, while many results support the theory that obesity is related to OPLL formation, the mechanism by which obesity induces OPLL remains unclear. Our MR studies have demonstrated that a high BMI has a causal effect on OPLL. However, we could not prove the SNP-obesity interaction (Appendix 2). Further mechanistic studies on obesity and OPLL are necessary.
We found that the causal relationship between T2D and OPLL is slight by our MR analysis, despite the positive genetic correlation between OPLL and T2D. Many OPLL studies have focused on the relationship with T2D (Akune et al., 2001;Kobashi et al., 2004). Insulin acts on endogenous tyrosine kinase receptors and receptors for insulin-like growth factor-I, a potent anabolic factor in bone formation (Giustina et al., 2008;Locatelli and Bianchi, 2014). Therefore, it has been postulated that increased insulin production due to impaired insulin action may stimulate osteogenic cells in ligaments and cause OPLL (Akune et al., 2001). However, our results indicate that the impact of T2D on the development of OPLL is small. Therefore, most reasons for the high prevalence of T2D in OPLL patients reported so far can be attributed to the high prevalence of obesity in OPLL patients.
Our MR analysis also revealed the causality of high BMD in OPLL. This result is consistent with the clinical observation that OPLL patients have an increased tendency of ossification throughout the body and often have heterotopic ossification of other spinal ligaments, such as the anterior longitudinal ligament (Nishimura et al., 2018), interspinous ligament (Mori et al., 2016), and nuchal ligament (Yoshii et al., 2019), as well as extraspinal ossification lesions in the shoulder, hip, knee, and ankle joint (Hukuda et al., 1983). A gene set enrichment analysis identified significant enrichment in sets associated with BMD in children as well as adults. BMD is low at birth, increases gradually with age, peaks in the 20 s, and then slowly declines (Iglesias-Linares et al., 2016). Therefore, it is assumed that BMD in children reflects the degree of bone formation, although it is affected by various factors. Our results can support that OPLL is closely associated with bone formation. However, the relationship between cerebral aneurysms and OPLL has not been reported. Therefore, for evaluating the causality of cerebral aneurysms on OPLL, an MR analysis with more cerebral aneurysm-related SNPs as the instrumental variables is desirable. Such a study will be conducted in the future. In addition to these traits, additional analyses were performed for ankylosing spondylitis with ossifying lesions in the spine as well as OPLL. We evaluated the correlation between ankylosing spondylitis and OPLL but found no correlation in the limited data available (Appendix 3, Figure 1-figure supplement 17).
We demonstrated the differences in genetic characteristics of OPLL. We identified three significant loci for C-OPLL and nine significant loci for T-OPLL in the stratification analyses of the GWAS by OPLL subtypes. There was a considerable difference in the allele frequencies of lead variants in these loci between subtypes (Supplementary file 11). However, most of the alternative allele frequencies of these variants were small, and future confirmation by a study with a larger sample size is desirable. Furthermore, our MR and BMI PRS analyses showed that the effect of BMI on T-OPLL was much larger than that of C-and ALL-OPLL. The common disease study often shows clinically defined diseases based on common signs and symptoms are actually heterogeneous in the cause, such as hypertension and diabetes mellitus. Therefore, research focusing on disease subtypes is useful in characterizing the disease in detail and elucidating its specific causes, leading to more personalized treatment.
This study demonstrates that OPLL is a highly heritable disease. Although previous clinical studies have suggested that OPLL is heritable, there have been no studies that have mentioned the heritability of OPLL. This study is the first to clarify the high heritability of OPLL quantitatively. Further elucidation of the pathogenesis of OPLL from a genetic approach is expected.
Based on the results of this study, we expect to establish new non-surgical treatments. Although this study identified that OPLL is closely linked to bone metabolism, there are few studies examining the prognosis of OPLL with the use of bone-modifying agents such as bisphosphonates, which suppress both bone formation and resorption. A randomized controlled trial evaluated the effect of etidronate disodium on postoperative OPLL progression in patients with OPLL who had undergone posterior decompression surgery, but no significant effect was demonstrated (Yonenobu et al., 2006). Future studies are needed to determine whether existing agents are adequate or whether we need to develop new agents. Regarding obesity, the cause of OPLL, weight loss guidance, and aggressive bariatric treatment, including surgery, for some patients may be considered. Long-term prospective studies are needed to evaluate the effect of weight loss on OPLL suppression. In addition, this study suggests an involvement of the immune system in OPLL. Although most OPLL research to date has focused on bone metabolism, it is also essential to study the pathogenesis of OPLL from an immunological approach in the future to develop unprecedented therapies.
There are some limitations in this study. The first is the GWAS sample size. Although this study is the largest OPLL GWAS in the world and used samples collected from facilities throughout Japan, additional samples are needed to clarify the pathogenesis of OPLL further and better define subtype differences. In addition, OPLL GWAS in other ethnic groups has not been reported. We expect future international meta-analysis using data from non-Japanese ethnic groups to elucidate regional differences in the frequencies of OPLL. Second, gene expression data in spinal ligament tissue is scarce. Therefore, we were forced to use tissue data different from ligament tissue, such as GETx (Consortium, 2015), which did not include bone, cartilage, or ligamentous tissue, and data from a mouse Achilles tendon ossification model, which resembles spinal ligament ossification. We also used bulk data from spinal ligaments, but the sample sizes were limited. We focused on the genes closest to the GWAS signals, but some of these genes may not be the true causative genes because their expression was not confirmed or low in ligament tissues as far as we could ascertain from the limited data available. It is desirable to increase the number of spinal ligament tissue samples and perform expression analysis using scRNAseq. Additionally, we expect that functional data such as eQTL in ligament tissues or other responsible tissues will be available in the future. If so, genes other than those focused on in this study could be identified as causal genes. Another limitation is that the Japanese GWAS data used for genetic correlation analysis were limited to 96 traits, and other traits not analyzed for genetic correlation may be associated with OPLL. In addition, some of the data used for MR were not East Asian. While we cannot immediately improve upon the limitations listed above, we intend to strengthen these areas for future research for OPLL.
In conclusion, this study identified candidate genes in genomic loci associated with OPLL, and subsequent post-GWAS analyses showed a causal relationship between other traits and OPLL: obesity and high BMD. This study will serve as a basis for future research to elucidate the pathogenesis of OPLL in more detail and to develop new treatment methods.

Subjects
All the subjects analyzed in this study were Japanese. OPLL samples were collected from facilities throughout Japan. The GWAS data of this study consisted of three sets: GWAS set-1, -2, and -3. The case samples of set-1 and -2 were used as discovery and replication samples, respectively, in the previous GWAS (Nakajima et al., 2014). In these data sets, the cases had OPLL of more than or equal to two vertebrae. For the case of set-3, we recruited patients with OPLL in more than or equal to five vertebras or OPLL thicker than 5 mm in the thoracic spine in 2018-2019. When assessing the presence or absence of OPLL, expert spine surgeons in each institution examined patients' plain radiography or computed tomography (CT) in detail (Figure 1-figure supplement 1).
Regarding control data, we used genotyping data from BioBank Japan (BBJ) Nagai et al., 2017) in set-1 and -2, and those from the Medical Genome Center (MGC) Biobank database of the National Center for Geriatrics and Gerontology (NCGG) in set-3. Details of the characteristics of the subjects are shown in Supplementary file 1.
This study followed the Strengthening the Reporting of Genetic Association Studies (STREGA) reporting guideline (Little et al., 2009).

Study approval
All participating individuals provided written informed consent to participate in this study following approval by ethical committees at RIKEN Centers for Integrative Medical Sciences (approval ID: 17-16-39), Hokkaido University (approval ID: 16-059), and all other participating institutes.

Genotyping and quality control
Genomic DNA was extracted from peripheral venous blood samples using a standard method. We genotyped case and control samples using the Illumina OmniExpressExome BeadChip, a combination of Illumina OmniExpress BeadChip and Illumina HumanExome BeadChip, or Illumina Asian Screening Array (Supplementary file 1).
For quality control of genotyped SNPs, we excluded those with (i) SNP call rate < 99%, (ii) MAF < 0.01, and (iii) Hardy-Weinberg equilibrium p-value<1.00 × 10 -6 . We constructed a reference panel to obtain imputed genotypes with high accuracy using the 1000 Genomes Project Phase 3 (1KGP 3 [May 2013 n=2504]) and 3256 in-house Japanese whole-genome sequence data obtained from BBJ (JEWEL 3K) in the same way as previously reported (Akiyama et al., 2019). SNPs with allele frequency differences greater than 0.06 between the genotyped control data and the 1KGP3 East Asian and JEWEL 3K data in reference panel were excluded.
For sample quality control, we excluded samples whose sex differed between genotype and clinical data. We evaluated cryptic relatedness by calculating estimates of pairwise IBD (PI_HAT) and removed samples that showed second-degree relatedness or closer (PI_HAT > 0.25). Population stratification was estimated using principal component analysis (PCA) with four populations from HapMap data as the reference: European (CEU), African (YRI), Japanese (JPT), and Han Chinese (CHB) with SmartPCA (Patterson et al., 2006). We generated a scatterplot using the top two associated principal components (eigenvectors) and selected samples within the East Asian (JPT/CHB) cluster. We excluded samples with a genotyping call rate of <98% (Figure 1-figure supplement 1).

Phasing and genotype imputation
We performed pre-phasing with EAGLE (v2.4.1)  and SNP imputation with minimac4 (v1.0.0) (Das et al., 2016) using the reference panel mentioned above. After imputation, we used SNPs with an imputation quality of Rsq > 0.3 and MAF > 0.005 for the subsequent association study.

GWAS and meta-analysis
We performed an association analysis of autosomes of GWAS set-1, -2, and -3 independently. We performed a logistic regression analysis using PLINK2.0 (Purcell et al., 2007) with the top 10 principal components (PCs) as covariates assuming an additive model, and evaluated the association of each imputed SNP. We then meta-analyzed the three GWAS sets with an inverse variance method under a fixed effect model using METAL software . Regarding X chromosomes, we performed a logistic regression analysis in males and females separately for each GWAS set using PLINK2.0, with the top 10 PCs as covariates assuming an additive model. We then integrated the results of males and females in each GWAS using an inverse variance method under a fixed-effect model (Figure 1-figure supplement 1).
We estimated confounding biases derived from population stratification and cryptic relatedness using LD score regression using LD scores for the East Asian population .

Conditional association analysis
We used the distance-based approach to determine significant loci. We defined the SNP with the lowest p-value within each locus as the lead SNP. We defined an associated locus of a lead SNP as 1 Mb of its surrounding sequences in both directions. We extended the region to nearby significant variants and their 1 Mb surrounding sequences as far as a significant variant was contained in the defined region. In addition, we margined 1 Mb from significant variants at both ends. We performed a stepwise conditional meta-analysis to determine the independent association signals in the associated loci. We conducted conditional analyses of GWAS set 1-3 separately and integrated the results using a fixed-effects model with the inverse variance method. We repeated this process until the top associated variants fell below the locus-wide significance level (p<5.0 × 10 -6 ) in each stepwise procedure. As a result of this analysis, we identified two additional loci associated with OPLL (lead SNPs: rs35281060 and rs1038666). We determined the boundaries of these regions based on the estimated recombination rates from hg19/1000 Genomes Nov 2014 East Asian (Table 1). We confirmed that there are no SNPs outside each locus in LD (r 2 > 0.1) with SNPs that met genome-wide significance levels within the locus.
We calculated LD with the lead SNPs using whole-genome sequence data of 1KGP3 East Asian and JEWEL 3K by PLINK2.0 and produced regional association plots using Locuszoom (Pruim et al., 2010).

Statistical power analysis
We evaluated the statistical power of GWAS meta-analysis for ALL-OPLL using the genpwr package for R (Moore et al., 2019). We set the model to an additive model, with the type 1 error rate and statistical power set to 5 × 10 −8 and 0.8 or 0.5, respectively.

Annotation
We used ANNOVAR  and defined a gene with a lead SNP or, if not, a gene in the nearest vicinity of a lead SNP as a candidate gene for the region because of previous reports that the majority of noncording variants act on the closest gene (Fulco et al., 2019;Nasser et al., 2021).
To identify candidate causal variants in the eight novel loci, we annotated the SNPs that exceeded the threshold of significance (p<5.0 × 10 −8 ) and were in high LD (r 2 > 0.8) with lead variants newly identified in the GWAS meta-analysis. We explored the biological role of these variants using SNP annotation tools, including HaploReg (Ward and Kellis, 2012), RegulomeDB (Boyle et al., 2012), and ANNOVAR .

Subtype-stratified GWAS and meta-analysis
We performed subtype-stratified GWASs and meta-analyses for C-OPLL and T-OPLL in the same way as the analysis with all samples from set to 1-3 (ALL-OPLL). Cases in GWAS set-3 were all T-OPLL samples; therefore, we carefully re-examined patients' plain radiography or CT in set-1 and -2, where we defined C-OPLL case samples as those with OPLL limited to the cervical spine and defined T-OPLL as OPLL affecting more than two vertebrae in the thoracic spine (Figure 1-figure supplement 1). The detailed sample numbers are listed in Supplementary file 1.

Estimation of phenotypic variance
We estimated the heritability of OPLL using LDSC software . The variance explained by the variants was calculated based on a liability threshold model assuming the prevalence of OPLL to be 3.0% (Matsunaga and Sakou, 2012). Furthermore, the model assumed that subjects had a continuous risk score and that subjects whose scores exceeded a certain threshold developed OPLL.

Bayesian statistical fine-mapping analysis
To prioritize causal variants in OPLL susceptibility loci, we conducted a fine-mapping analysis using FINEMAP v1.3 software (Benner et al., 2016), using z-scores of GWAS meta-analysis for ALL-OPLL and LD matrices calculated by 1KGP3 EAS and JEWEL 3K data. We assumed one causal signal in the ±1 Mb region from both ends of significant variants at each significant locus. However, for 12p11 and 12p12, in which we identified significant secondary signals by a conditional analysis, we defined the range of the region referring to regional association plots (Figure 1-figure supplement 3). We calculated a PP in which each genetic variant was the true causal variant. Then, we ranked the candidate causal variants in descending order of their PPs and created a 95% credible set of causal variants by adding the PPs of the ordered variants until their cumulative PP reached 0.95.

Gene set enrichment analysis
FUMA is a web-based platform in which we can perform GWAS-related analyses such as gene-based analysis and gene set enrichment analysis (Watanabe et al., 2017). We conducted a gene set enrichment analysis using FUMA. Because variants often act on genes that are close in the distance (Fulco et al., 2019;Nasser et al., 2021), we selected genes on a distance basis using the following criteria and used them as input data: genes (i) located within 1 Mb and (ii) the five closest to the leading SNPs of each genome-wide significant locus.

Gene-based association analysis
To examine the combined effect of SNPs, we conducted gene-based association analysis using MAGMA (de Leeuw et al., 2015) implemented in FUMA (Watanabe et al., 2017). MAGMA uses input GWAS summary statistics to compute gene-based p-values. For this analysis, the gene-based p-value is computed for protein-coding genes by mapping SNPs to genes if SNPs are located within the genes. We set the gene window 2 kb upstream and 1 kb downstream from the genes to include regulatory elements and analyzed 19,933 genes. We used the default settings and LD information from East Asian ancestry subjects from 1KGP3 as a reference. We set the p-value threshold for the test to 5.0 × 10 -8 (not a gene-wide threshold).

eQTL analysis
We obtained transcript data from the Genotype-Tissue Expression (GTEx) v8 (Consortium, 2015). We examined eQTL data in all available tissues in GTEx to determine the association between gene expression and the leading SNPs within the genome-wide significant locus. We set the significance threshold for eQTL as an FDR < 0.05.

SMR
SMR is used to determine associations between genetically determined traits, such as gene expression and protein abundance, and a complex trait of interest, such as OPLL. This analysis is designed to test whether the effect size of an SNP on the phenotype is mediated by gene expression. We used SMR software (Zhu et al., 2016). We used OPLL summary statistics data and eQTL data obtained from GTEx v7 (Consortium, 2015), which is the same build as in this study (GRCh37). We evaluated heterogeneity in dependent instruments (HEIDI) using multiple SNPs in a cis-eQTL region to distinguish pleiotropy from the linkage. As previously reported, we set the threshold for the HEIDI test to 0.05 and the threshold for SMR to 8.4 × 10 -6 (Zhu et al., 2016).

Partitioning heritability enrichment analysis
We performed stratified LD score regression using 220 cell-type-specific annotations of four histone marks (H3K4me1, H3K4me3, H3K9ac, and H3K27ac) (Roadmap Epigenomics Consortium et al., 2015). We divided the 220 cell-type-specific annotations into 10 cell-type groups (10 in adrenal/ pancreas, 34 in central nervous system, 15 in cardiovascular, 6 in connective/bone, 44 in gastrointestinal, 67 in immune/hematopoietic, 5 in kidney, 6 in liver, 10 in skeletal muscle, and 23 in other). We assessed heritability enrichment in histone marks of 220 individual cell types and ten cell type groups, as described by Finucane et al., 2015. The regression analysis excluded variants within the major histocompatibility complex (MHC) region (chromosome 6: 25-34 Mb). We defined significant heritability enrichment as those with an FDR < 0.05.

Confirmation of gene expression in ligament tissue
We examined the gene expression in the tissue. We used the deposited RNA-seq data in spinal ligament (yellow ligament) in patients with OPLL and cervical CSM (GSE126060) and human ligament cells after chondrogenic differentiation and control (GSE188759) (Tachibana et al., 2022). The presence or absence of gene expression was confirmed by the transcripts per kilobase million (TPM) values in each tissue and cell in both data, followed by a comparison of the expression levels in the two groups by a t-test using R software.

scRNAseq data processing
We used available deposited data from two studies using Achilles tendon cells from an ossification model of the mouse Achilles tendon to investigate the expression levels of candidate causal genes in our study: burn/tenotomy heterotopic ossification model (GSE126060) (Sorkin et al., 2020) and Achilles tendon puncture model (GSE188758) (Tachibana et al., 2022). These data had already been processed into MTX format, and we conducted subsequent data analyses using the R package Seurat (Hao et al., 2021).
Regarding GSE126060 (Sorkin et al., 2020), we used data from day 0 and day 7 of the replicate 1-4 samples from the deposited data. First, we filtered out cells with less than 1000 genes per cell and a mitochondrial read content greater than 5%. After normalization using the NormalizeData function and identifying the independently variable 2000 features in each data set, we selected repeatedly variable features across data sets for data integration with the SelectIntegrationFeatures function. We selected anchors using the FindIntegrationAnchors function and used them to integrate the data sets using the IntegrateData function with the default parameter. Next, we scaled this integrated data using the ScaleData function and ran PCA. We then conducted dimensionality reduction with UMAP using the top 20 PCs. After computing k for the k-nearest neighbor algorithm using the top 20 PCs with the FindNeighbors function, we conducted the clustering with the FindClusters function. Next, we annotated the clusters based on the expression of marker genes in the cells comprising the ossified ligament: mesenchymal cell (Pdgfra, Prrx1, Clec3b, and Dpt), dendritic cell (Cd209a and Flt3), endothelial cell (Emcn, Pecam1, and Sox18), lymphocyte (Ccr7, Ms4a4b, and Ms4a1), neuromuscular (Pax7 and Ncam1), pericyte/smooth muscle (Abcc9, Rgs5, Acta2, Pdgfrb, and Des), macrophage (Lyz2, Cd14, and Cd68), and nerve (Sox10, Plp1, Mbp, and Mpz) (Tachibana et al., 2022). Finally, we investigated the expression levels of candidate genes found in GWAS meta-analyses, gene-based association analysis, and SMR in each cluster.
In addition, we also conducted an analysis using the data from GSE188758 (Tachibana et al., 2022). Since this is a single data set from five mice, we did not conduct data integration. Other than that, we processed the data in the same manner described above and performed clustering and sub-clustering on this data to evaluate the expression of the candidate genes found in our study.

Genetic correlation
We estimated the genetic correlations using a bivariate LD score regression  using recently published GWAS results for Japanese: 96 complex traits (Akiyama et al., 2019;Akiyama et al., 2017;Ishigaki et al., 2020;Kanai et al., 2018; see Supplementary file 13 for the traits analyzed). In these reports, GWASs were conducted for 42 diseases (Ishigaki et al., 2020), 58 quantitative traits (Kanai et al., 2018), BMI (Akiyama et al., 2017), and height (Akiyama et al., 2019) (total of 102 traits). We could not calculate the following six traits for genetic correlation with OPLL due to the small sample size: biliary tract cancer, endometriosis, hematological malignancy, interstitial lung disease, periodontal disease, and E/A ratio.
We excluded variants found in the MHC region from the analysis because of their complex LD structure. We set the significance threshold for genetic correlations as FDR < 0.05. We evaluated the genetic correlation only for ALL-OPLL because the sample sizes of the C-and T-OPLL groups were too small for this analysis.

MR
We applied two-sample MR methods that handle summary statistics from separate studies to evaluate the causality of BMI, T2D, cerebral aneurysm, and BMD on OPLL using the R package 'TwoSam-pleMR' (Hemani et al., 2018). Regarding BMI, we reconstructed trans-ancestral meta-analysis data using Japanese and European GWAS results in the same way as previously reported (Akiyama et al., 2017;Locke et al., 2015). Regarding T2D, cerebral aneurysm, and BMD, we used publicly available results of East Asian meta-analysis of GWASs for T2D (Spracklen et al., 2020), mainly European metaanalysis of GWAS for cerebral aneurysm (Bakker et al., 2020), and European GWAS for BMD (Kemp et al., 2017). In the SNP selection, we extracted the lead SNPs for each study as the instrumental variables. When the lead SNP was not present in the GWAS meta-analysis for OPLL, we selected proxy SNPs highly correlated with the original variants (r 2 > 0.8). If there were no proxy SNPs that met the criteria, we excluded the SNPs from the analysis. The details of the instrument variables in each MR are shown in Figure 4-figure supplement 1 and Supplementary file 14.
We conducted additional MR methods in addition to the conventional IVW method for sensitivity analyses: the MR-Egger method (Bowden et al., 2015;Burgess and Thompson, 2017) and the simple and weighted median methods (Bowden et al., 2016). In addition, we conducted subtypestratified analyses using summary statistics from C-and T-OPLL GWAS and examined the differences between OPLL subtypes. The number of SNPs used in each analysis is listed in Supplementary file 14. We also assessed the potential bias in the results of MR with leave-one-out analysis and funnel plots.
We performed reverse-direction MR using the lead SNPs in the significant locus in the meta-analysis for ALL-OPLL as instrumental variables. Parts of these OPLL-associated SNPs were not present within the BMI and cerebral aneurysm data sets, although all were contained in the dataset of T2D and BMD. Therefore, in the analyses for BMI and cerebral aneurysm, we substituted them with the proxy SNP in the same way described above (Figure 4-figure supplement 1).
A participant overlap between the samples used to estimate genetic associations with the exposure and the outcome in two-sample MR can bias results . Therefore, using exposure and outcome instrument variables estimated in non-overlapping samples is preferable. We checked the cohort data used in our MR and found no overlap with the OPLL case sample, although the control samples used in the OPLL study overlapped with up to 2.2% of samples used in the BMI study and 3.4% in the T2D study. According to a simulation study of the association between sample overlap and the degree of bias in instrumental variable analysis, an unbiased estimate is obtained if the overlapping sample includes only control samples for the binary outcome .

Additional MR for obesity-related traits
Based on the MR results for the above four traits, we used BMI data only for Japanese (Akiyama et al., 2017). We conducted MR to evaluate the causal effect of BMI on OPLL in the same way as above as a sensitivity analysis.
In addition, we conducted MR for obesity-related traits in the same manner as described above. We used the data from the large GWAS meta-analyses for obesity-related traits from UKBB and Giant Consortium: BMI, WHR, and WHRadjBMI (Pulit et al., 2019).

Comparison of the effect sizes of the SNPs between the GWAS metaanalyses for OPLL and BMI
After pruning the SNPs, we evaluated the correlations of the effect sizes of the SNPs between the GWAS meta-analyses for OPLL (ALL-, C-, and T-OPLL) and the GWAS for BMI for sets of SNPs stratified by the thresholds based on the GWAS p-values for each trait. We used the results of the GWAS meta-analysis of OPLL and the Japanese GWAS of BMI (Akiyama et al., 2017) for this analysis. First, we extracted SNPs with MAF ≥ 0.01, shared between the meta-analyses for OPLL and BMI. Next, we conducted LD pruning of the SNPs for the SNP pairs in LD (r 2 ≥ 0.5) using 1KGP3 East Asian and JEWEL 3K data by PLINK. Finally, we used 367,672 SNPs in subsequent analyses. We calculated the correlation of the effect sizes of the SNPs between the GWAS meta-analyses for OPLL and BMI GWAS for sets of SNPs stratified by the thresholds based on the GWAS p-values in each trait using R software.

Generation of PRS of BMI and its application to OPLL GWAS samples
We used PRS to investigate the genetic impact of BMI on OPLL. We constructed the PRS of BMI using a pruning and thresholding method (Khera et al., 2018;Figure 5-figure supplement 1). In the discovery phase, we generated PRS as the sum of risk alleles weighted by the log odds ratio of association estimated in the Japanese GWAS for BMI (Akiyama et al., 2017). We pruned SNPs based on nine different LD thresholds r 2 = 0.1-0.9 in a 250 kb window using PLINK2.0 and constructed 20 PRS using independent SNPs at p-value thresholds of 5.0 × 10 -8 ~ 1 for each LD threshold. In the validation phase, we determined the best pruning parameter with another Japanese dataset in which genotyping was conducted using the Illumina OmniExpressExome chip. We conducted data quality control in the same manner as in the OPLL GWAS. We used the calculated BMI PRS after standardization (mean = 0, standard deviation = 1). In the test phase, we calculated the Spearman's rho score between BMI and PRS to assess the fit of the models and determined r 2 = 0.9 and p-value = 0.6 as the best parameter. We applied BMI PRS for cases and controls in the present OPLL GWAS study.
We measured the association between BMI PRS and OPLL using logistic regression with principal components 1-10 as covariates for each OPLL dataset. Then, we meta-analyzed the effect sizes of BMI PRS in the three OPLL data sets using the inverse variance method under a fixed-effect model. We conducted this analysis for ALL-OPLL, C-, and T-OPLL. Furthermore, we applied BMI PRS for C-and T-OPLL cases and compared the effect of OPLL PRS on OPLL subtypes using logistic regression with principal components 1-10 as covariates.
To compare the effect size of BMI PRS in OPLL with T2D, for which BMI is known to be one of the major causes, we performed the same BMI PRS scoring on the T2D data set (39,758 cases and 111,487 controls). Unfortunately, the T2D data set overlaps almost entirely with the data from the discovery phase of the BMI PRS, which carries a high risk of statistical inflation due to model overfitting. However, since no other data was available for T2D, the comparison was made in this manner, knowing the above points.

Replication analysis for study validation
For replication cases, we extracted DNA from the peripheral blood of 230 OPLL patients, independent of the cases used in the GWAS meta-analysis described above. We genotyped them using Illumina Asian Screening Array. All case samples used here were of the T-OPLL subtype. For replication controls, we used genotyping data for 1889 Japanese individuals from the BBJ project genotyped with Illumina Asian Screening Array. Sample and SNP QC, phasing, and imputation were performed similarly to the GWAS meta-analysis described above. Finally, we conducted a logistic regression analysis using 212 case and 1866 control samples with the top 10 PCs as covariates to confirm the effect size of the lead SNP in each region in the GWAS meta-analysis. To validate the results of our GWAS meta-analysis, we performed a binomial test to assess the concordance in the direction of the betas in the original GWAS meta-analysis and the association analysis with the replication data.
In addition, for the SNPs used as instrumental variables in MR, we updated the SNP statistics by adding replication data to the data in sets 1-3 using meta-analysis with the inverse variance method under a fixed-effect model. Using these updated SNP statistics, we re-conducted MR and evaluated the changes in the results. Koike • Supplementary file 2. Independent signals identified by a conditional analysis.
• Supplementary file 4. Causal variants estimated by a Bayesian statistical fine-mapping analysis.
• Supplementary file 9. Enrichment analysis for active enhancer by cell groups.
• Supplementary file 10. Enrichment analysis for active enhancer by cell types.
• Supplementary file 12. Details of genes analyzed by gene expression analysis.
• Supplementary file 13. Genetic correlation between OPLL and other disease or quantitative trait.
• Supplementary file 14. Instrument variables to analyze the causal effect on OPLL.
• Supplementary file 15. MR analyses inferring causality of body mass index, type 2 diabetes, cerebral aneurysm, and bone mineral density on OPLL.
• Supplementary file 16. Estimates of Egger intercept in Mendelian randomization for four traits and OPLL.
• Supplementary file 17. Instrument variables in the reverse-direction Mendelian randomization analysis.
• Supplementary file 19. Results of Mandelian rondomization of body mass index and OPLL using Japanese data only.
• Supplementary file 20. Instrument variables used in additional Mendelian randomization for obesity related traits.
• Supplementary file 21. MR analyses inferring causality of the obesity-related traits on OPLL.
• Supplementary file 22. Estimates of Egger intercept in Mendelian randomization for obesityrelated traits and OPLL.

Data availability
Full GWAS results will be available after acceptance via the website of the Japanese ENcyclopedia of GEnetic associations by Riken (JENGER, http://jenger.riken.jp/en/).
The following dataset was generated: